Research for Uyghur-Chinese Neural Machine Translation

نویسندگان

  • Jinying Kong
  • Yating Yang
  • Xi Zhou
  • Lei Wang
  • Xiao Li
چکیده

The problem of rare and unknown words is an important issue in Uyghur-Chinese machine translation, especially using neural machine translation model. We propose a novel way to deal with the rare and unknown words. Based on neural machine translation of using pointers over input sequence, our approach which consists of preprocess and post-process can be used in all neural machine translation model. Pre-process modify the Uyghur-Chinese corpus to extend the ability of pointer network, and the postprocess retranslating the raw translation by a phrase-based machine translation model or a wordlist. Experiment show that neural machine translation model used the approach proposed by this paper get a higher BLEU score than the phrase-based model in Uyghur-Chinese MT.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Factor-Based Uyghur-Chinese Statistical Machine Translation

This paper is an initial explore to Uyghur-Chinese statistical machine translation. Uyghur and Chinese are very different from each other, the former is an agglutinative language with very productive inflectional and derivational word-formation processes, but the characters of the latter are almost hieroglyphics, morpheme processing doesn’t work at all. We integrate Uyghur additional informatio...

متن کامل

A Phrase Table Filtering Model Based on Binary Classification for Uyghur-Chinese Machine Translation

In statistical machine translation, large amount of unreasonable phrase pairs in a phrase table can affect the decoding efficiency and the overall translation performance, especially in Uyghur-Chinese machine translation. In this paper, we present a novel phrase table filtering model based on binary classification, which consider differences between Uyghur and Chinese, and draw lessons from bin...

متن کامل

Uyghur-Chinese Translation Disambiguation Method Research Based on Knowledge Automatic-Acquisition

This thesis studies the disambiguation method in Uyghur-Chinese translation, and proposes the design philosophy of automatic-acquisition in translation label library aiming at the deficiency of disambiguation corpus in Uyghur. It refers to the existing Uyghur-Chinese bilingual dictionary, Chinese corpus and the Internet, and acquires the corresponding Chinese translation label examples to Uyghu...

متن کامل

Rule Based Analysis of the Uyghur Nouns

This paper describes the implementation of a rule-based analyzer for Uyghur (spoken in Sin Kiang, China) Nouns. We hope this paper will give some contribution for advanced studies to the Uyghur Language in Machine Translation and Natural Language Processing. Like all Turkic languages, the Uyghur Language is an agglutinative language that has productive inflectional and derivational suffixes. In...

متن کامل

Log-linear Models for Uyghur Segmentation in Spoken Language Translation

To alleviate data sparsity in spoken Uyghur machine translation, we proposed a log-linear based morphological segmentation approach. Instead of learning model only from monolingual annotated corpus, this approach optimizes Uyghur segmentation for spoken translation based on both bilingual and monolingual corpus. Our approach relies on several features such as traditional conditional random fiel...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016